Introduction

Rice (Oryza sativa L.) is the most widely grown crop, constituting a staple food for over half of the global population (Lu et al. 2009; Wang and Li 2011). Asian cultivated rice consists of the subspecies japonica and indica (Chen et al. 1992; Cheng et al. 2003). Research on the genetic differentiation of indica and japonica rice has resulted in the production of valuable germplasms and important information relevant to crossbreeding (Ikehashi 2009; Lu et al. 2009; Jyothi et al. 2018; Zhang 2020). Comparative studies of wild, indica, and japonica rice indicate that wild rice is genetically undifferentiated from indica and japonica rice (Cheng et al. 2003; Shen et al. 2004; Song et al. 2006), and it was proposed that japonica rice was first domesticated from wild rice and that the indica subspecies was subsequently developed from crosses between japonica and local wild rice (Huang et al. 2012).

China is one of the main centers of origin and differentiation for Asian cultivated rice, and consequently, harbors abundant genetic rice resources (Zong et al. 2007; Fuller et al. 2009; Huang et al. 2010). Qiandongnan (26.5834° N, 107.9829° E) is a typical mountainous region (altitude: 1000–2000 m) located on the eastern slope of the Yunnan–Guizhou Plateau in China. Qiandongnan rice is cultivated under relatively isolated conditions in accordance with the local specific environment and agricultural practices and is called “He” and “Gu” by the locals. These landraces have been the staple food for the local Miao and Dong populations and are grown in terraced fields on hills and river valleys. These cultivars possess several favorable agronomic traits, including a desirable awn, grain, and hull color, grain shape, plant height, and growth period length. Thus, cultivated landraces in this region represent ideal models for studying the evolution of rice in mountainous terrains. Identifying the genetic basis for these diverse varieties may provide important insights into the breeding of elite rice varieties for sustainable agricultural production. However, research involving Qiandongnan rice landraces is limited.

A genome-wide association study (GWAS) based on molecular markers, such as single nucleotide polymorphisms (SNPs), copy number variations, and simple sequence repeats, can be used to analyze gene clustering and quantitative trait loci. Many sequenced plants, including Arabidopsis thaliana (Togninalli et al. 2020), rice (Huang et al. 2010, 2012; Li et al. 2018; Yuan et al. 2020) and wheat (Ain et al. 2015; Bellucciet al. 2015; Beyer et al. 2019), have been extensively analyzed by GWAS.

Specific-locus amplified fragment sequencing (SLAF-seq), which is based on second-generation sequencing technology, can effectively generate genome-wide high-density markers and efficiently simplify the genome (Sun et al. 2013). It has been used to develop high-density molecular markers in rice (Jiang et al. 2018; Yang et al. 2020), peanut (Hu et al. 2018), Thinopyrum ponticum (Liu et al. 2018), cotton (Chen et al. 2014), maize (Xia et al. 2014), cauliflower (Zhao et al. 2016), soybean (Yang et al. 2020) and other plants.

In this study, SLAF-seq analysis was conducted on 60 Qiandongnan rice landraces. A GWAS analysis involving eight agronomic traits was then conducted to identify potential loci associated with rice production and improvement. We identified three loci (signals peak) that were closely related to previously reported genes or gene regions (Matsushita et al. 2003; Abe et al. 2010; Shao et al. 2010; Toriba and Hirano 2014; Tetsuo et al. 2015; Li et al. 2016). Our results provide new details regarding the evolution of rice and the detection of genes responsible for complex traits in Qiandongnan rice landraces.

Materials and Methods

Plant materials and growth conditions

A total of 60 Qiandongnan rice landraces and two Oryza rufipogon Griff. wild rice accessions (Supplementary Table S1) were selected from the rice collection maintained at the Crop Germplasm Resources Centre at the Institute of Agro-Bioengineering of Guizhou University, Guizhou, China. We referred to germplasm database records for details regarding phenotypic variations and geographic origins. For each accession, 30–50 seedlings were cultivated under natural field conditions during normal rice-growing seasons at the Experimental and Demonstration Base for Transgenic Plants at the Agricultural Bioengineering Research Institute (Guiyang, China).

For each landrace, 10 plants were randomly selected. The agronomic traits such as plant height, panicle length, and flag leaf length, width, and angle were calculated in the mature period. The flag leaf length, width, and angle were measured on the main tiller. Awn length and grain length and width were measured using a caliper. Pericarp and awn colors were also observed. The growth period was recorded as the number of days from sowing to fully ripened.

DNA isolation

For each landrace, 20–30 rice seedlings were planted in pots at the Agricultural Bioengineering Research Institute. Total genomic DNA of 20-day-old seedling leaves was extracted according to the cetyl trimethylammonium bromide (CTAB) method (Doyle 1990). The isolated DNA was quantified using a NanoDrop 2000 spectrophotometer (Thermo Scientific, USA).

SLAF-seq library construction and high-throughput sequencing

The SLAF-seq procedure was performed as described (Sun et al. 2013), with minor modifications. Briefly, a pilot SLAF-seq experiment was conducted to establish the optimal yield conditions to prevent SLAF duplication and produce an even distribution of SLAFs for maximum efficiency. Based on the pilot experiment results, the SLAF library was constructed as follows. Genomic DNA was first incubated at 37°C with RsaI [New England Biolabs (NEB), Ipswich, MA, USA], T4 DNA ligase (NEB), ATP (NEB), and RsaI adapter. Restriction enzyme digestion/ligation reactions were heat-inactivated at 65°C and samples were digested with EcoRI and BfaI at 37°C. A polymerase chain reaction (PCR) was conducted using a solution consisting of diluted restriction enzyme digestion/ligation samples, dNTP, Taq DNA polymerase (NEB) and RsaI primer containing barcode 1 (forward primer sequence: 5′-AATGATACGGCGACCACCGA-3′; reverse primer sequence: 5′-CAAGCAGAAGACGGCATACG-3′). The following PCR amplification conditions were used: 98°C for 3 min; 18 cycles of 98°C for 10 s, 65°C for 30 s and 72°C for 30 s; 72°C for 5 min. The PCR products were purified using an E.Z.N.A. Cycle Pure Kit (Omega Bio-Tek, Norcross, GA, USA) and pooled. The pooled samples were incubated at 37°C with RsaI, T4 DNA ligase, ATP, and Solexa adapter. Samples were then purified using a Quick Spin column (Qiagen, Hilden, Germany) and analyzed on a 2% agarose gel. Fragments that were 350–450 bp or 500–600 bp (with indices and adaptors) were isolated using a Qiagen QIAquick Gel Extraction Kit (Qiagen). The fragments were then amplified using the Phusion Master Mix (NEB) and Solexa Amplification primer mix (Illumina, San Diego, CA, USA) according to the manufacturers’ recommended procedures. After the samples were gel purified, DNA bands (SLAFs) corresponding to 350–450 bp or 500–600 bp were excised and diluted for paired-end sequencing using a HiSeq 2500 sequencing platform (Illumina). Low-quality reads (Q < 20), reads with adaptor sequences, and duplicated reads were eliminated, and the remaining high-quality reads were used for mapping.

SNP calling and quality

The 125-bp raw read pairs were filtered and trimmed so that each mate was at least 80 bp, with a minimum Q_phred quality value > 20 over three consecutive bases. We used the Rice Genome Annotation Project reference genome (382 Mb; RGAP v. 7.0; http://rice.plantbiology.msu.edu/index.shtml) (Goff et al. 2002). The original reads for each sample were aligned with the reference genome sequence using SOAP software (Li et al. 2009a). Reads that were mapped to the same positions were classified into the same SLAF group. If the reads mapped to the reference genome overlapped with two SLAF tags, they were assigned to both groups (Gu et al. 2015). The number of SLAF markers per 100 kb was calculated to determine the SLAF marker distribution on each chromosome.

To identify high-quality SNPs, the GATK (McKenna et al. 2014) and Samtools (Li et al. 2009b) packages were used for SNP calling based on the defined SLAF Markers. The SNPs identified by GATK and Samtools were selected, and those with a genetic integrity ≥ 50% and a minor allele frequency ≥ 5% were used for further analysis. The SLAF markers with SNPs were considered polymorphic SLAF markers and were used to calculate the number of SNPs per 100 kb (i.e., SNP distribution on each chromosome).

Phylogenetic and population genetic analyses

The population structure of the 60 collections was determined with high-quality SNPs using the Admixture program (Alexander et al. 2009), which estimates individual ancestry and admixture proportions. The number of clusters (K) was predefined as 2–10, and the best K result was used for the Q matrix in the GWAS.

Neighbor-joining trees and principal component analysis (PCA) plots were used to characterize the population structure of the rice landraces. A phylogenetic tree consisting of 60 local landraces was generated according to neighbor-joining analyses using MEGA5 software (Saitou and Nei 1987; Tamura 2011). The PCA was conducted using cluster analysis software (Shaun et al. 2007) to clarify the clustering of the main component. Neighbor-joining trees were produced and PCA analysis was conducted based on SNPs.

Genome-wide association analysis

SNP markers were used to analyze the associations between genes and quantitative traits. The TASSEL program (Bradbury et al. 2007) was used to determine the association between each SNP and the corresponding trait, and association graphs were plotted with Admixture and SPAGeDi software (Hardy and Vekemans 2002).

The association between SNP markers and the corresponding trait (Supplementary Table S2) was determined using the simple linear model (GLM compressed) of the TASSEL software (Bradbury et al. 2007). As there were both japonica and indica rice groups in the population, it was not suitable to use the compressed MLM method. We used the following formula: Y=Xα+Qβ+Kµ+e, where Q represents the population structure calculated by Admixture, K represents the genetic relationship between samples calculated using SPAGeDi software (Hardy and Vekemans 2002), e represents the residual term, X corresponds to genotype, and y refers to phenotype. The P-value for each SNP was calculated. Gene annotations were based on the reference genome of the Rice Genome Annotation Project (382 Mb; RGAP v. 7.0; http://rice.plantbiology.msu.edu/index.shtml) (Goff et al. 2002).

Accession numbers

The gene sequence data from this article can be found in the Rice MSU Genome Annotation Release 7 under the following accession numbers: qSH1 gene (LOC_Os01g62920, Chr1: 36.44 M), An10 gene (RM237-RM265, Chr7: 28.57 M-Chr7: 36.95 M), An1 gene (LOC_Os04g28280.1, Chr4: 16.73 M), pre-awn1gene (LOC_Os07g35870.1, Chr7: 21.46 M), Kala4 gene (LOC_Os04g47059.1, chr4: 27.91 M), pre-anth1 gene (LOC_Os04g47040.1, Chr4: 27.88 M), pre-anth2 gene (LOC_Os04g47080.1, Chr4: 27.94 M), DEP2 gene (LOC_Os07g42410, chr7: 25.38 Mb), and qGL7-2 gene (Indel1-RM21945, chr7: 25.38 M-chr7: 25.62 M).

Results

SLAF-seq

Fig. 1: Distribution maps of SLAF markers and SNPs on rice chromosomes.

Fig. 2: Cluster map of Qiandongnan rice landraces

(a) Analyses of cross-validation errors of clustering with a hypothetical k-value of 1–10. The x-axis presents the k-value (1–10), while the y-axis presents the cross-validation error values

(b) Population structure of 60 landraces with a k-value of 2. Different colors represent individual landraces

(a) Distribution map of SLAF markers on rice chromosomes. Each line represents a chromosome. Bar colors correspond to the number of SLAF markers per 100 kb. The bar is black if there are more than 30 SLAFs per 100 kb

(b) Distribution map of SNPs on rice chromosomes. Each line represents a chromosome. Bar colors correspond to the number of SNPs per 100 kb. The bar is black if there are more than 300 SNPs per 100 kb

A total of 178,287,776 clean reads were obtained. For each landrace, 2,000,000–3,500,000 clean reads were generated (Supplementary Table S2). The base-calling accuracy (Q score > 20) was 85.98%. Based on the alignment results, 314,065 high-quality SLAFs were obtained, with a 3.20-fold average sequencing depth. The number of SLAFs on each chromosome is shown in Table 1. Among all the chromosomes, chromosome 1 contained the most SLAFs (37,489), while chromosome 9 had the least (19,069). The number of SLAFs per 100 kb was calculated, and the chromosomal distribution of the markers is presented in Fig. 1a. The overall SLAF marker density was 822.1 per Mb (Fig. 1a). This suggested that the SLAF markers were evenly distributed, and that individual SLAF segments were representative of the whole genome. Of the high-quality SLAFs, 165,922 were polymorphic, indicating that the average polymorphism rate was 52.83% (Table 3).

SNP markers

A total of 35,434,302 SNPs were detected with a genetic integrity of 80.67% based on the defined SLAF markers. The identified SNPs were aligned with the reference genome sequence. We obtained 571,521 high-quality SNP markers with an overall density of 1496.1 per Mb. Additionally, the number of SNP markers per 100 kb (Fig. 1b) revealed that the markers were evenly distributed on the chromosomes. In total, 2,302,820 heterozygous SNPs were detected and the heterozygosity rate for all SNPs was 8.06% (Table 2).

Evolutionary structure and analysis of seed shattering

According to the SNPs, all 60 landraces were divided into two populations, indica and japonica (Fig. 2). Among them, 51 landraces were japonica and belonged to the “He” rice group. The smaller indica group contained nine landraces, which were present in the “Gu” group (Fig. 2a).

The neighbor-joining algorithm was used to generate the phylogenetic tree. The phylogenetic tree revealed that the 60 Table 1: Statistics of SLAF markers on each rice chromosome

Chromosome	SLAF Num.
Chr01	37,489
Chr02	30,897
Chr03	32,621
Chr04	29,761
Chr05	25,796
Chr06	25,673
Chr07	24,458
Chr08	23,459
Chr09	19,069
Chr10	19,765
Chr11	22,615
Chr12	21,809
ChrSy	313
ChrUn	340
Total	314,065

Total is the number of SLAF markers for the entire genome

landraces were also divided into the same two groups. Fifty-one landraces formed the japonica group, while the remaining landraces were included in the indica group (Fig. 3a). The japonica and indica groups corresponded with the “He” and “Gu” groups, respectively. The “Hai nan ye jing” and “Hai nan ye xian” wild rice lines clustered in between the two groups. The “Hai nan ye jing” rice was more closely related to Nipponbare rice, while “Hai nan ye xian” rice was more related to 93-11 rice (Fig. 3a). The PCA clustering results were consistent with those obtained using the neighbor-joining algorithm (Fig. 3b) (Saitou and Nei 1987; Tamura 2011), which supported the classification of Qiandongnan rice landraces into japonica or indica groups.

The loss of seed shattering is a crucial event during the domestication of cereal crops (Saeko et al. 2006). The qSH1 gene was reported to have an important effect on the grain shattering of rice. In our study, we found that two SNPs (site Chr04: 364,459,91 and Chr04: 364,475,26) were located in the qSH1 gene, with the A and A bases at Chr04: 364,459,91 and Chr04: 364,475,26 present in the 51 He landraces (which was consistent with Nipponbare) and both G bases present in the nine Gu landraces (which was consistent with 93-11) (Table 4). Further, the He landraces were consistent with Nipponbare in their shattering character, with the seeds being resistant to shattering, while the Gu landraces were consistent with 93-11 in that their seeds shattered easily (Saeko et al. 2006; Zhang et al. 2009).

Table 2: SNP statistics

SNP Num.	Total Base	Total N	Integrity	Heter Num.	Hete Ratio
571,521	35,434,302	6,848,746	80.67%	2,302,820	8.06%

SNP Num. is the number of total detected SNPs; Total Base is the number of SNPs of all samples; Total N is the number of gene deletion sites; Integrity is the SNPs integrity; Heter Num. is the number of heterozygous SNPs of all samples; Hete Ratio is the heterozygosity rate for all SNPs

Table 3: Polymorphism statistics of the SLAF markers

SLAF Num.	Total Depth	Average Depth	SLAF Poly	Poly Ratio
314,065	62,384,746	3.20	165,922	52.83%

SLAF Num. is the number of all detected SLAF Markers; Total Depth is the number of reads of all SLAF Markers; Average Depth is the average depth of the SLAF markers in each sample; SLAF Poly is the number of polymorphic SLAF markers; Poly Ratio is the percentage of polymorphic SLAF markers

Table 4: SNP loci analysis of the qSH1 gene in rice landraces in the Qiandongnan area in Guizhou

Sample No.	SNP locus		Sample No.	SNPs locus
Sample No.	Chr1 36445991	Chr1 36447526	Sample No.	Chr1 36445991	Chr1 36447526
1	A	A	33	A	A
2	A	A	34	G	G
3	A	A	35	A	A
4	A	A	36	G	G
5	A	A	37	G	G
6	A	A	38	G	G
7	A	A	39	R	G
8	A	A	40	N	G
9	A	A	41	A	A
10	A	A	42	A	A
11	A	A	43	A	A
12	R	A	44	A	A
13	A	A	45	A	A
14	A	A	46	A	A
15	A	A	47	A	A
16	A	A	48	A	A
17	A	A	49	N	N
18	A	A	50	A	A
19	A	A	51	A	A
20	A	A	52	G	G
21	A	A	53	A	A
22	A	A	54	A	A
23	A	A	55	A	A
24	A	A	56	A	A
25	A	A	57	A	A
26	A	A	58	A	A
27	A	A	59	A	A
28	A	A	60	G	G
29	A	A	61	A	A
30	A	A	62	R	G
31	A	A	63	A	A
32	G	G	64	G	G

Samples 1 to 60 are the local rice landraces, and the samples with “gu” as the last word in their names are marked with bold font. The rest of the samples are “He”

Sample No. 61 is Hai nan ye xian, and sample No. 62 is hai nan ye jing

Sample No. 64 is 93-11 and No. 63 is Nipponbare

R represents A and G, and N is the omitted base in sequencing

Fig. 3: Phylogenetic tree and PCA of Qiandongnan rice landraces

(a) Phylogenetic tree of landraces. Each branch represents a landrace. Samples are also presented in Supplementary Table S1

(b) PCA of landraces. The rice landraces were clustered in a 2D graph by PCA. The x-axis presents the first principal component and the y-axis presents the second principal component. A dot corresponds to a rice sample, while different colors are used to indicate different groups. The samples are also presented in Supplementary Table S1. indica: 93-11; japonica: Nipponbare

Fig. 4: GWAS of awn length, grain color, and grain shape of the Qiandongnan rice landraces

(a) Manhattan plots of the GLM for awn length and quantile-quantile plot of the GLM for awn length. (b) Manhattan plots of the GLM for grain color and quantile-quantile plot of the compressed GLM for grain color

Negative log10-transformed P-values from a genome-wide scan are plotted against the positions on each of 12 chromosomes. The yellow horizontal dashed line corresponds to −log10 (p) = 5, which is the genome-wide significance threshold. The blue horizontal dashed line corresponds to −log 10 (p) = 8, and the blue arrow shows the significant association signal peaks

Analysis of quantitative trait associations

The SNP markers were used to analyze the associations between genes and quantitative traits in the Qiandongnan rice landraces (Hardy and Vekemans 2002; Bradbury et al. 2007). Eight genome-wide association graphs were plotted, including the traits of awn length, grain color, grain shape, awn color, flag leaf angle, flag leaf length, grain width, and growth period (Fig. 4 and Supplementary Fig. S1). Here, traits of awn length, grain color, and grain shape were considered as representative examples of the analysis. A total of 2,307 association signals were identified with a P-value < 10^-6, and among them we identified 858 strong association signals with a P < 10^-8 (Table 5). There were significant association signal peaks for awn length on chromosomes 1, 3, 7, and 11 (Fig. 4a, indicated by blue arrows), grain color on chromosomes 3, 4, 8, and 12 (Fig. 4b, indicated by blue arrows), and grain shape on chromosomes 7 and 8 (Fig. 4c, indicated by blue arrows). These association signal peak regions are given in Table 6. The gene annotation information was obtained from the Rice Genome Annotation Project 7.0, http://rice.plantbiology.msu.edu/cgi-bin/gbrowse/rice/.

Table 5: Number of association signals in each chromosome

Trait	Marker P-value	Num. of association signals
Trait	Marker P-value	Chr1	Chr2	Chr3	Chr4	Chr5	Chr6	Chr7	Chr8	Chr9	Chr10	Chr11	Chr12	total
Awn length	Marker p (10^-6)	22	3	12	12	14	15	11	74	4	1	313	7	488
Awn length	Marker p (10^-8)	5	1	2	0	0	0	21	0	0	0	99	0	128
Grain color	Marker p (10^-6)	161	92	378	172	74	92	93	120	67	80	61	382	1,772
Grain color	Marker p (10^-8)	47	25	237	78	22	26	31	33	30	37	20	130	716
Grain shape	Marker p (10^-6)	7	4	1	2	3	4	7	12	1	4	1	1	47
Grain shape	Marker p (10^-8)	3	1	0	1	2	1	2	1	1	2	0	0	14

Table 6: The regions of the signal peaks

Traits	Signal peak	Region	Marker P-value (10^-6)	Marker P-value (10^-8)	Annotated gene Num.
Awn length	Awn peak 1	Chr1: 34.57–34.66M	13	5	25
	Awn peak 2	Chr3: 27.87–27.97M	5	2	19
	Awn peak 3	Chr7: 21.36–21.61M	71	20	59
	Awn peak 4	Chr11: 21.24–22.97M	303	94	266
Grain color	Grain color peak 1	Chr3: 13.85–22.80M	312	225	1507
	Grain color peak 2	Chr4: 27.38–28.17M	11	9	159
	Grain color peak 3	Chr8: 2.89–3.08M	10	9	32
	Grain color peak 4	Chr12: 3.04–4.43M	46	36	265
	Grain color peak 5	Chr12: 19.40–21.32M	146	53	313
Grain shape	Grain shape peak 1	Chr7: 25.37–25.63M	4	0	53
	Grain shape peak 2	Chr8: 6.45–7.90M	10	0	115

Some of the association signals identified in this study were consistent with the gene or gene location regions reported in other studies. We identified an association signal with rice awn length on chromosome 1 at Chr1: 34.57–34.66 M (Table 6, with 13 associated markers). The results indicated that the region corresponded to the position of the An10 gene (Matsushita et al. 2003), which occurs between two SSR markers (RM237 and RM265) at position 28.57–36.95 Mb on the long arm of chromosome 1 (Matsushita et al. 2003) (Fig. 5a and Table 6). It was reported that An1 encodes a bHLH protein that is essential for rice awn development (Li et al. 2016). In the association peaks analysis for awn length traits, we detected a candidate gene for awn length (predicted awn length gene1, pre-awn1), annotated as a bHLH protein in a strong association signal peak with 59 associated markers, at chr7: 21.36–21.61 M (Fig. 5b, Table 6).

Fig. 5: Regions of the genome showing association signals near previously-identified genes

(a) Awn length signal peak 1. (b) Awn length signal peak 3. (c) Grain color signal peak 2. (d) Grain shape signal peak 1

The top of each panel shows a region on each side of the peak SNP, the position of which is indicated by a vertical blue line. Negative log 10-transformed P-values from the compressed GLM are plotted on the vertical axis; axis scales are slightly different across panels. Black horizontal dashed lines indicate the genome-wide significance threshold with a P-value=6. The bottom of each panel shows a region on each side of the SNP signal peak (blue dashed lines). Previously identified genes are indicated by red dashed lines or green dashed lines

In terms of grain color, an association signal peak with 11 associated markers was detected at position 27.38–28.17 Mb on chromosome 4 (Table 6). This is consistent with the Kala4 gene (Tetsuo et al. 2015) and another two genes annotated to encode anthocyanin regulatory Lc proteins (predicted anthocyanin regulatory gene 1 and 2, pre-anth1 and pre-anth2), which were predicted to regulate anthocyanin biosynthesis (Fig. 5c).

We also identified an association signal with four markers at the region at position 25.37–25.63 Mb on chromosome 7 (Fig. 5d, Table 6) corresponding to grain shape. This is consistent with a grain shape-related gene DEP2, which was reported to regulate grain size (Abe et al. 2010). Additionally, this association signal was also close to the region between Indel1 and RM21945 corresponding to the reported location of the grain length-related gene qGL7-2 (Shao et al. 2010).

Discussion

SLAF and SNP density is crucial for evolutionary and GWAS analysis. One previous study obtained 144,754 high-quality SLAFs and 30,578 polymorphic SLAFs in soybean, and a total of 8,738 SNPs were selected for gene mapping (Yang et al. 2020); another study obtained 374,265 SLAFs, 56,295 polymorphic SLAFs, and 102,025 SNPs in Ammopiptanthus mongolicus (Duan et al. 2019); 433,679 high-quality SLAFs and 29,075 polymorphic SLAFs were developed in peanut (Hu et al. 2018); and in rice, 56,768 SLAFs and 211,818 SNPs were obtained in one study (Yang et al. 2018), and 50,330 SLAFs were obtained in another (Li et al. 2016). In our study, a total of 178,287,776 reads, 314,065 SLAFs, 165,922 polymorphic SLAFs, and 571,521 evenly-distributed SNPs were developed from 60 Qiandongnan rice landraces, providing a strong foundation for the follow-up analysis.

The Qiandongnan rice landraces were divided into two groups. The japonica group contained 51 closely clustered individuals (Fig. 3) with a heterozygosity rate of 0.062. Thus, it can be considered that the sources of these germplasms may be relatively uniform. However, the nine individuals from the indica group were not closely clustered (Fig. 3) and had a heterozygosity rate of 0.073, suggesting that the sources of these germplasms were distantly related. This is consistent with the idea that indica rice was developed from crosses between japonica rice and local wild rice (Huang et al. 2012). Additionally, the japonica group comprised more members than the indica group, which is similar to the current distribution of japonica and indica rice (Lu et al. 2009). Thus, two possibilities exist: the japonica and indica groups of the Qiandongnan landraces likely evolved locally from their wild rice ancestors during a long-term domestication process and gradually formed the Qiandongnan landrace populations, or alternatively, differentiated landraces were introduced directly to the region. The “Gao qian xiang” and “Gong gui he” landraces were genetically close to the wild rice “Hai nan ye jing” and “Hai nan ye xian”, which clustered between Nipponbare rice and wild rice (Fig. 4). This suggests they may be relatively primitive germplasms. We suspect that the “Gao qian xiang” and “Gong gui he” landraces might be intermediate rice types between wild rice and japonica rice.

The results from the admixture groupings, MEGA5 clustering analysis, and cluster PCA enabled the classification of cultivated rice landraces in Guizhou, China, into japonica or indica groups (Fig. 3 and 4) (Saitou and Nei 1987; Price et al. 2006; Shaun et al. 2007; Alexander et al. 2009; Tamura 2011). These classifications were consistent with the “He” and “Gu” groupings (Table S1) as well as with the SNP sites for the japonica and indica seed-shattering gene qSH1 (Saeko et al. 2006), which was associated with rice domestication (Table 2). It was previously reported that SNP sites between Nipponbare and 93-11 rice may lead to altered seed shattering (Saeko et al. 2006; Zhang et al. 2009; Magwa et al. 2016; Sheng et al. 2019). Our findings suggested that the SNPs in qSH1 may have important consequences for the domestication (the trait of seed shattering) of Qiandongnan rice landraces. Thus, we suggest that the qSH1 gene can be used as a marker for investigations of other Qiandongnan rice landraces. Further, we hypothesize that rice growers in the Qiandongnan area classified their rice landraces into “He” and “Gu” based on the resistance to seed shattering over a long period of rice cultivation and the qSH1 gene may have played an important role in this domestication event.

Kala4 activates the expression of genes upstream of the flavonol biosynthesis pathway, including the genes encoding chalcone synthase and dihydroflavonol 4-reductase. It also induces the expression of downstream genes, such as those for leucoanthocyanidin reductase and leucoanthocyanidin dioxygenase, to produce a particular pigment. A previous study revealed that the sequence differences from -11 kb to-83 kb upstream of Kala4 were responsible for rice seed color variability (Tetsuo et al. 2015). Interestingly, we discovered that the other two predicted genes pre-anth1 and pre-anth2 that regulate anthocyanin were close to Kala4. These genes are encoded by a putative anthocyanin regulatory LC protein (Fig. 4b). We hypothesized that these two genes may form a gene cluster with Kala4 and collectively help regulate rice seed color.

Our GWAS results for awn length and grain shape (Fig. 4) suggest that these traits are regulated by An10 and qGL7-2, respectively (Matsushita et al. 2003). These findings may provide relevant information for the discovery of An10 and GL7-2 genes. Additionally, some GWAS regions that generated strong association peaks were detected for the first time. No related genes have been reported in these regions, such as Chr11: 21.24–22.97 M, associated with awn length traits (Table 6, with 13 associated markers). We believe these association results will be helpful for the identification of genes responsible for specific traits in rice. However, gene cloning studies are needed to confirm our results.

Conclusion

This study provided a foundation for gene cloning and molecular marker-assisted breeding of these materials and demonstrates the utility of SLAF-seq in a relatively isolated mountainous area.

Acknowledgments

This work was supported by grants from the Genetically Modified Organisms Breeding Major Projects of China [2016ZX08010003], the Natural Science Foundation of Guizhou (20181044), The Young Scholars and Technology Talents Development Project of Guizhou Education Department, the Science and Technology Project of Guizhou Province [20171039], and the Construction Program of Biology First-class Discipline in Guizhou (GNYL[2017] 009).

Author Contributions

Degang Zhao and Yan Li designed the experiment and analyzed the data; Yan Li, Xiaofang Zeng, Guangzheng Li, Yi Chen Zhao and Jianrong Li performed the experiments; Yan Li wrote the manuscript.

Conflict of Interest

The authors have no conflict of interests to declare

References

Abe Y, K Mieda, T Ando, I Kono, M Yano, H Kitano, Y Iwasaki (2010). The small and round seed1 (SRS1/DEP2) gene is involved in the regulation of seed size in rice. Genes Genet Syst 85:327‒339

Ain QU, A Rasheed, A Anwar, T Mahmood, M Imtiaz, T Mahmood, X Xia, Z He, U Quraishi (2015). Genome-wide association for grain yield under rainfed conditions in historical wheat cultivars from Pakistan. Front Plant Sci 6; Article 743

Alexander DH, J Novembre, K Lange (2009). Fast model-based estimation of ancestry in unrelated individuals. Genomics Res 19:308‒325

Bellucci A, AM Torp, S Bruun, J Magid, SB Andersen, SK Rasmussen (2015). Association mapping in scandinavian winter wheat for yield, plant height, and traits important for second-generation bioethanol production. Front Plant Sci 6; Article 1046

Beyer S, S Daba, P Tyagi, H Bockelman, M Mohammadi (2019). Loci and candidate genes controlling root traits in wheat seedlings—a wheat root GWAS. Funct Integr Genomics 19:91‒107

Bradbury PJ, Z Zhang, DE Kroon, TM Casstevens, Y Ramdoss, ES Buckler (2007). TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633‒2635

Chen W, J Yao, L Chu, Y Li, XM Guo, YS Zhang, J Jenkins (2014). The development of specific SNP markers for chromosome 14 in cotton using next-generation sequencing. Plant Breed 133:256‒261

Chen WB, YI Sato, I Nakamura, H Nakai (1992). Indica-japonica differentiation in Chinese rice landraces. Euphytica 74:195‒201

Cheng CY, M Reiko, T Suguru, F Yoshimichi, O Hisako, O Eiichi (2003). Polyphyletic origin of cultivated rice based on the interspersion pattern of SINEs. Mol Biol Evol 20:67‒75

Doyle J (1990). Isolation of plant DNA from fresh tissue. Focus 12:13–15

Duan YZ, JW Wang, ZY Du, FR Kang (2019). SNP sites developed by Specific Length Amplification Fragment Sequencing (SLAF-seq) and genetic analysis in Ammopitanthus mongolicus. Bull Bot Res 38:141–147

Fuller DQ, L Qin, Y Zheng, Z Zhao, XG Chen, Hosoya, L Aoi, GP Sun (2009). The domestication process and domestication rate in rice: Spikelet bases from the Lower Yangtze. Science 323:1607‒1610

Goff SA, R Darrell, L Tien-Hung (2002). A draft sequence of the rice genome (Oryza sativa L. sspp. japonica). Science 296:92‒100

Gu BG, TY Zhou, JH Luo, H Liu, YC Wang, YY Shangguan, JJ Zhu (2015). An-2 encodes a cytokinin synthesis enzyme that regulates awn length and grain production in rice. Mol Plant 8:1635‒1650

Hardy OJ, X Vekemans (2002). Spagedi: A versatile computer program to analyse spatial genetic structure at the individual or population levels. Mol Ecol Notes 2:618‒620

Hu XH, SZ Zhang, HR Miao, FG Cui, Y Shen, WQ Yang, TT Xu, N Chen, XY Chi, ZM Zhang, J Chen (2018). High-density genetic map construction and identification of QTLs controlling oleic and linoleic acid in peanut using SLAF-seq and SSRs. Sci Rep 8; Article 5479

Huang XH, N Kurata, XH Wei, ZX Wang, A Wang, Q Zhao, Y Zhao, KY Liu, HY Lu, WJ Li, YL Guo, YQ Lu, CC Zhou, DL Fan, QJ Weng, CR Zhu, T Huang, L Zhang, YC Wang, L Feng, H Furuumi, T Kubo, T Miyabayashi, XP Yuan, Q Xu, GJ Dong, QL Zhan, CY Li, A Fujiyama, A Toyoda, TT Lu, Q Feng, Q Qian, JY Li, B Han (2012). A map of rice genome variation reveals the origin of cultivated rice. Nature 490:497‒501

Huang XH, XH Wei, T Sang, Q Zhao, Q Feng, Y Zhao, CY Li, CR Zhu, TT Lu, ZW Zhang (2010). Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet 42:961‒967

Ikehashi H (2009). Why are There indica type and japonica type in rice, History of the studies and a view for origin of two types. Rice Sci 1:1‒13

Jiang X, WH Long, Q Peng, XC Zhang, XW Liu, DS Zhang, YL Gong, YS Zhang, SS Zhu (2018). Mapping of QTLs for heading date of rice based on SLAF_seq recombinant inbred lines population. Mol Plant Breed 16: 8068‒8072

Jyothi B, B Divya, LV Subba Rao, P Laxmi Bhavani, P Revathi, P Raghuveer Rao, B Rachana, G Padmavathi, J Aravind Kumar, C Gireesh, MS Anantha, R Abdul Fiyaz, C Suvarna Rani, ARG Ranganatha (2018). New plant type trait characterization and development of core set among Indica and tropical japonica genotypes of rice. Plant Genet Resour 16:504‒512

Li F, J Xie, X Zhu X Wang Y Zhao, X Ma, Z Zhang, MAR Rashid, Z Zhang, L Zhi, S Zhang, J Li, Z Li, H Zhang (2018). Genetic basis underlying correlations among growth duration and yield traits revealed by GWAS in rice (Oryza sativa L.). Front Plant Sci 9; Article 650

Li H, B Handsaker, A Wysoker, T Fennell, J Ruan (2009a). The sequence alignment-map format and SAMtools. Bioinformatics 25:2078‒2079

Li R, C Yu, Y Li, TW Lam, SM Yiu, K Kristiansen, J Wang (2009b). SOAP2: An improved ultrafast tool for short read alignment. Bioinformatics 25:1966‒1967

Li Y, XF Zeng, YC Zhao, JR Li, DG Zhao (2016). Identification of a new rice low-tiller mutant and association analyses based on the SLAF-seq method. Plant Mol Biol Rep 35:72‒82

Liu LQ, QL Luo, W Teng, B Li, HW Li, YW Li, ZS Li, Q Zheng (2018). Development of Thinopyrum ponticum-specific molecular markers and FISH probes based on SLAF-seq technology. Planta 247:1099‒1108

Lu BR, X Cai, X Jin (2009). Efficient indica and japonica rice identification based on the InDel molecular method: Its implication in rice breeding and evolutionary research. Prog Nat Sci 10:1241‒1252

Magwa RA, H Zhao, W Yao, W Xie, L Yang, Y Xing, X Bai (2016). Genome wide association analysis for awn length linked to the seed shattering gene qSH1 in rice. J Genet 95:639–646

Matsushita S, T Kurakazu, DK Sobrizal, A Yoshimura (2003). Mapping of genes for awn in rice using Oryza meridionalis introgression lines. Rice Genet Newslett 20:17–18

McKenna A, M Hanna, E Banks, A Sivachenko, K Cibulskis, A Kernytsky, K Garimella, D Altshuler, S Gabriel, M Daly, MA Depristo (2014). The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genomics Res 20:1297‒1303

Price AL, NJ Patterson, RM Plenge, ME Weinblatt, NA Shadick, D Reich (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38:904‒911

Saeko K, I Takeshi, L Shao Yang, E Kaworu, F Yoshimichi (2006). An SNP caused loss of seed shattering during rice domestication. Science 312:1392‒1396

Saitou N, M Nei (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406‒425

Shao GN, SQ Tang, J Luo, GA Jiao, XJ Wei, A Tang, JL Wu, JY Zhuang, PS Hu (2010). Mapping of qGL7-2, a grain length QTL on chromosome 7 of rice. J Genet Genomics 37:523‒531

Shaun P, N Benjamin, TB Kathe, T Lori, MAR Ferreira, D Bender, J Maller, P Sklar, PIW de Bakker, MJ Daly, PC Sham (2007). PLINK: A tool set for whole-genome association and population-based linkage analyses. Amer J Hum Genet 81:559‒575

Shen YJ, H Jiang, JP Jin, ZB Zhang, B Xi, YY He, G Wang, C Wang, L Qian, X Li, QB Yu, HJ Liu, DH Chen, JH Gao, H Huang, TL Shi, ZN Yang (2004). Development of genome-wide DNA polymorphism database for map-based cloning of rice genes. Plant Physiol 135:1198‒1205

Sheng XB, XF Wang, YN Tan, ZZ Sun, D Yu, GL Yuan, DY Yuan, MJ Duan (2019). Construction of qSH1 mutants in rice (Oryza sativa) using CRISPR/Cas9 and characteristic analysis of mutagenesis. J Agric Biotechnol 27:212‒221

Song ZP, WY Zhu, J Rong, X Xu, JK Chen, BR Lu (2006). Evidences of introgression from cultivated rice to Oryza rufipogon (Poaceae) populations based on SSR fingerprinting: Implications for wild rice differentiation and conservation. Evol Ecol 20:501‒522

Sun X, D Liu, X Zhang, W Li, H Liu, W Hong, C Jiang, N Guan, C Ma, H Zeng, C Xu, J Song, L Huang, C Wang, J Shi, R Wang, X Zheng, C Lu, X Wang, H Zheng (2013). SLAF-seq: An efficient method of large-scale de novo SNP discovery and genotyping using high-throughput sequencing. PLoS One 8; Article e58700

Tamura K (2011). MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and macimum parsimony methods. Mol Biol Evol 28:2731‒2739

Tetsuo O, M Hiroaki, O Taichi, Y Takuya, T Noriko, E Kaworu, Y Masahiro, E Takeshi, I Takeshi (2015). The Birth of a Black Rice Gene and Its Local Spread by Introgression. Plant Cell 27:2401‒2414

Togninalli M, U Seren, JA Freudenthal, JG Monroe, DZ Meng, M Nordborg, D Weigel, K Borgwardt, A Korte, DG Grimm (2020). AraPheno and the AraGWAS Catalog 2020: A major database update including RNA-Seq and knockout mutation data for Arabidopsis thaliana. Nucl Acids Res 48:1063‒1068

Toriba T, HU Hirano (2014). DROOPING LEAF and OsETTIN2 genes promote awn development in rice. Plant J 77:616‒626

Wang Y, J Li (2011). Branching in rice. Curr Opin Plant Biol 14:94‒99

Xia C, LL Chen, TZ Rong, R Li, Y Xiang, P Wang, CH Liu, XQ Dong, B Liu, D Zhao, RJ Wei, H Lan (2014). Identification of a new maize inflorescence meristem mutant and association analysis using SLAF-seq method. Euphytica 202:1‒10

Yang QH, HX Jin, Yu, XM Yu, XJ Fu, HJ Zhi, FJ Yuan (2020). Rapid identification of soybean resistance genes to soybean mosaic virus by SLAF-seq bulked segregant analysis. Plant Mol Biol Rep 38:666–675

Yang XH, XZ Xia, Y Zeng, BX Nong, ZQ Zhang, YY Wu, FQ Xiong, YX Zhang, HF Liang, GF Deng, DT Li (2018). Identification of candidate genes for gelatinization temperature, gel consistency and pericarp color by GWAS in rice based on SLAF-sequencing[J]. PLoS One 13:e0196690

Yuan J, X Wang, Y Zhao, NU Khan, Z Zhao, Y Zhang, X Wen, F Tang, F Wang, Z Li Z Li (2020). Genetic basis and identification of candidate genes for salt tolerance in rice by GWAS. Sci Rep 10; Article 9958

Zhang GQ (2020), Prospects of utilization of inter-subspecific heterosis between Indica and japonica rice. J Integr Agric 19:1‒10

Zhang LB, QH Zhu, ZQ Wu, RI Jeffrey, SG Brandon, S Ge, T Sang (2009). Selection on grain shattering genes and rates of rice domestication. New Phytol 184:708‒720

Zhao Z, H Gu, X Sheng, H Yu, J Wang, L Huang, D Wang (2016). Genome-wide single-nucleotide polymorphisms discovery and high-density genetic map construction in cauliflower using specific-locus amplified fragment sequencing. Front Plant Sci 7; Article 334

Zong Y, Z Chen, JB Innes, C Chen, Z Wang, H Wang (2007). Fire and flood management of coastal swamp enabled first rice paddy cultivation in east China. Nature 449:459‒462